From plane to hierarchy: Deformable Transformer for Remote Sensing Image Captioning

Authors

Abstract

With the growth of remote sensing imagery, automatically understanding image content has attracted considerable interest among deep learning researchers working on remote sensing images. Inspired by natural image captioning, models with a CNN-RNN backbone supplemented by attention have been widely used for remote sensing image captioning. However, it is inefficient for the current attention layer to simultaneously mine the hidden foreground and background and perform interactive feature learning. Meanwhile, the new mainstream language model has recently surpassed the traditional LSTM in sentence generation. To address these problems, this paper proposes a novel idea: making flat remote sensing images stereoscopic by separating the foreground from the background. Based on this hierarchical information, we design a Deformable Transformer equipped with deformable scaled dot-product attention to learn multi-scale features through its powerful interactive learning ability. Evaluations are conducted on four classic remote sensing image captioning datasets. Compared with state-of-the-art methods, our Transformer variant achieves higher captioning accuracy.

Similar resources

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multistage prediction framework for image captioning, composed of multiple decoders each of which...

Remote Sensing: From Image Processing to Spatio-temporal Processing

This paper gives a brief survey of remote sensing techniques from the viewpoint of pattern recognition and media understanding (PRMU). First we give a brief summary of remote sensing, and then introduce related work on both remote sensing image processing and some unique issues in remote sensing image processing. We moreover point out that the future direction of remote sensing is expected to be ...

Remote Sensing Image Processing

About SYNTHESIS: This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. For more information visit www.morganclaypool.com SYNTHESIS LECTURES ON IMAGE, VIDEO & MULTIMEDIA PRO...

Learning to Guide Decoding for Image Captioning

Recently, much advance has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at ...

Journal

Journal title: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

Year: 2023

ISSN: 2151-1535, 1939-1404

DOI: https://doi.org/10.1109/jstars.2023.3305889